All-rounder: A flexible DNN accelerator with diverse data format support
Recognizing the explosive growth in DNN-based applications, several companies
have developed custom ASICs (e.g., Google TPU, IBM RaPiD, Intel NNP-I/NNP-T) and
built hyperscale cloud infrastructure around them. These ASICs execute the
inference or training operations of user-requested DNN models. Because DNN
models differ in data formats and operation types, such an ASIC must support
diverse data formats and remain general across operations. Conventional ASICs,
however, do not meet these requirements. To overcome these limitations, we
propose a flexible DNN accelerator called All-rounder. The accelerator is built
around an area-efficient multiplier that supports multiple precisions of integer
and floating-point data types. In addition, it includes a MAC array that can be
flexibly fused and split (fissioned) to execute various types of DNN operations
efficiently. We implemented the register-transfer-level (RTL) design in Verilog
and synthesized it in 28nm CMOS technology. To examine the practical
effectiveness of our designs, we also implemented two multiply units and three
state-of-the-art DNN accelerators as baselines. We compare our multiplier with
these multiply units and evaluate performance and energy efficiency at the
architecture level on eight real-world DNN models. Furthermore, we compare the
All-rounder accelerator with a high-end GPU, the NVIDIA GeForce RTX 3090. Across
the DNN benchmarks, the proposed All-rounder accelerator consistently achieves
speedups and higher energy efficiency than the baselines.
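The abstract does not describe how the multiplier shares hardware across precisions. A common technique for multi-precision integer multipliers, shown here only as an illustration and not as the All-rounder design, is to build a wide product from narrow sub-multipliers (fusion) or to run the same sub-multipliers as independent lanes (fission). In this minimal sketch, `BASE_BITS`, `base_mul`, `fused_mul`, and `fission_mul` are hypothetical names, and only unsigned integers are handled.

```python
from typing import List, Sequence

BASE_BITS = 4  # hypothetical width of the smallest hardware sub-multiplier


def base_mul(a: int, b: int) -> int:
    """One BASE_BITS x BASE_BITS unsigned sub-multiplier."""
    limit = 1 << BASE_BITS
    assert 0 <= a < limit and 0 <= b < limit
    return a * b


def fused_mul(a: int, b: int, bits: int) -> int:
    """Fusion: a bits x bits unsigned product built only from base sub-multipliers.

    Uses a*b = hh*2^(2h) + (hl + lh)*2^h + ll with h = bits/2, where each
    half-width product is itself built recursively. bits must be BASE_BITS * 2**k.
    """
    assert 0 <= a < (1 << bits) and 0 <= b < (1 << bits)
    if bits == BASE_BITS:
        return base_mul(a, b)
    assert bits > BASE_BITS and bits % 2 == 0, "bits must be BASE_BITS * 2**k"
    half = bits // 2
    mask = (1 << half) - 1
    a_hi, a_lo = a >> half, a & mask
    b_hi, b_lo = b >> half, b & mask
    hh = fused_mul(a_hi, b_hi, half)
    cross = fused_mul(a_hi, b_lo, half) + fused_mul(a_lo, b_hi, half)
    ll = fused_mul(a_lo, b_lo, half)
    # Shift-and-add step: in hardware this is a small adder tree after the sub-multipliers.
    return (hh << bits) + (cross << half) + ll


def fission_mul(a_lanes: Sequence[int], b_lanes: Sequence[int]) -> List[int]:
    """Fission: the same sub-multipliers produce independent narrow products."""
    return [base_mul(a, b) for a, b in zip(a_lanes, b_lanes)]


# Four 4x4 sub-multipliers give either one 8x8 product or four 4x4 products.
print(fused_mul(200, 173, 8))                      # 34600
print(fission_mul([3, 15, 7, 9], [5, 15, 2, 0]))   # [15, 225, 14, 0]
```

A 16-bit product would take sixteen base units under this scheme. A floating-point mode could reuse the same integer datapath for mantissa products, which this sketch omits.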
Hardware Improvements for a Sparse Matrix Multiplication Accelerator
Keywords: SpGEMM, Distribution Network, Reduction Network, Data tiling
Deep learning is used and researched in many industries, including image processing, natural language processing, and recommendation services. Model sizes are also growing alongside deep learning technology in pursuit of higher accuracy, and sparse matrix multiplication accounts for most of the operations in deep learning models. As a result, there is a growing need for research on sparse matrix multiplication accelerators. One accelerator that supports sparse general matrix-matrix multiplication (SpGEMM) is SIGMA (A Sparse and Irregular GEMM Accelerator). However, SIGMA's operation networks and its index-matching process are inefficient. We propose improvements in three areas to address these problems. First, we eliminate redundant hardware modules in the distribution network: when multiple Flex-DPEs are controlled by a network-on-chip (NoC), area and power can be reduced by using a network from which the unnecessary parts have been removed. Second, we propose a new reduction-network topology that uses only the output flip-flops to store and accumulate the sum of partial sums. Finally, for fast processing, we propose choosing an efficient partitioning (tiling) approach from a precomputed look-up table (LUT) indexed by the sparsity of each matrix, the number of operation elements, and the matrix size. The proposed distribution- and reduction-network enhancements reduce total hardware area by about 21.8% and power by 37.5%. When the stationary matrix is 80% sparse and the streaming matrix is 99% sparse, consulting the LUT and tiling by 2 reduces the clock-cycle count by about 80%.
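The third proposal, picking a tiling factor from a precomputed table instead of evaluating options at run time, can be sketched as below. This is a minimal illustration, not the thesis's implementation: the grid points (`SPARSITY_GRID`, `pe_grid`, `size_grid`), the candidate factors in `TILE_CHOICES`, and the `toy_cost` model are hypothetical placeholders. Real table entries would presumably come from simulation or measurement.

```python
import math
from itertools import product
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical LUT grid (not from the thesis); grids must be sorted ascending.
SPARSITY_GRID = (0.0, 0.5, 0.8, 0.9, 0.99)
TILE_CHOICES = (1, 2, 4, 8)

# (stationary sparsity, streaming sparsity, number of operation elements, matrix size)
Key = Tuple[float, float, int, int]
CostFn = Callable[[float, float, int, int, int], float]


def snap_down(value, grid):
    """Return the largest grid point <= value, or the smallest grid point."""
    below = [g for g in grid if g <= value]
    return below[-1] if below else grid[0]


def build_lut(estimate_cycles: CostFn, pe_grid: Sequence[int],
              size_grid: Sequence[int]) -> Dict[Key, int]:
    """Offline: for every grid point, store the tiling factor with the fewest estimated cycles."""
    lut: Dict[Key, int] = {}
    for s_stat, s_stream, pes, n in product(SPARSITY_GRID, SPARSITY_GRID, pe_grid, size_grid):
        lut[(s_stat, s_stream, pes, n)] = min(
            TILE_CHOICES, key=lambda t: estimate_cycles(s_stat, s_stream, pes, n, t))
    return lut


def choose_tiling(lut: Dict[Key, int], s_stat: float, s_stream: float, pes: int, n: int,
                  pe_grid: Sequence[int], size_grid: Sequence[int]) -> int:
    """Online: one table lookup replaces re-running the cost model."""
    key = (snap_down(s_stat, SPARSITY_GRID), snap_down(s_stream, SPARSITY_GRID),
           snap_down(pes, pe_grid), snap_down(n, size_grid))
    return lut[key]


def toy_cost(s_stat, s_stream, pes, n, t):
    """Made-up cost model for the demo: fixed per-tile overhead plus work that shrinks with t."""
    work = n * n * (1.0 - s_stat) * (1.0 - s_stream) / pes
    return 100 * t + math.ceil(work / t)


PE_GRID, SIZE_GRID = (64, 128), (256, 1024)
lut = build_lut(toy_cost, PE_GRID, SIZE_GRID)
print(choose_tiling(lut, 0.0, 0.0, 64, 256, PE_GRID, SIZE_GRID))  # 4 under toy_cost
```

Snapping each indicator down to a grid point keeps the table small; the trade-off is that inputs falling between grid points inherit the choice made for the lower point.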
I. Introduction
II. Background and Prior Work
2.1 Background
2.1.1 Multi-layer Perceptron (MLP)
2.1.2 Convolutional Neural Networks (CNN)
2.1.3 Transformer
2.2 Prior Works: Inner, Outer, Row-wise Product Based Accelerators
2.3 Prior Work: SIGMA
2.3.1 Dataflow of SIGMA
2.3.2 Distribution Network
2.3.3 Reduction Network
III. Proposed Sparse Accelerator Design
3.1 Distribution Network
3.2 Reduction Network
3.2.1 Reorganized Adder Tree
3.3 Data Tiling Strategy
IV. Evaluation
4.1 Methodology
4.2 Experimental Results
4.2.1 Area / Power Improvements
4.2.2 Performance Improvements
V. Conclusion
References
Master's thesis (dCollection)
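The second proposal in the thesis abstract above, keeping the running total of partial sums in the output flip-flops rather than in separate storage, can be illustrated with a behavioral model. The sketch below is a hypothetical model of a SIGMA-style sparse matrix-vector pass, not the thesis's RTL: `fold_schedule`, `OutputRegisterReduction`, and `spmv` are illustrative names, the per-row dictionary stands in for the adder tree's cluster reduction, and the multiplier count is a free parameter.

```python
from typing import Dict, List, Sequence, Tuple

Triplet = Tuple[int, int, int]  # (row, col, value) of a stationary nonzero


def fold_schedule(nonzeros: Sequence[Triplet], num_multipliers: int) -> List[List[Triplet]]:
    """Split the stationary nonzeros into folds of at most num_multipliers each."""
    return [list(nonzeros[i:i + num_multipliers])
            for i in range(0, len(nonzeros), num_multipliers)]


class OutputRegisterReduction:
    """Behavioral model: one accumulating output register per output row."""

    def __init__(self, num_outputs: int):
        self.out_regs = [0] * num_outputs

    def step(self, fold: Sequence[Triplet], x: Sequence[int]) -> None:
        # Multipliers plus adder tree: reduce the products of each row cluster in this fold.
        cluster_sums: Dict[int, int] = {}
        for row, col, val in fold:
            cluster_sums[row] = cluster_sums.get(row, 0) + val * x[col]
        # Accumulate directly into the output registers; no partial-sum buffer.
        for row, s in cluster_sums.items():
            self.out_regs[row] += s


def spmv(nonzeros: Sequence[Triplet], x: Sequence[int],
         num_rows: int, num_multipliers: int) -> List[int]:
    red = OutputRegisterReduction(num_rows)
    for fold in fold_schedule(nonzeros, num_multipliers):
        red.step(fold, x)
    return red.out_regs


# Nonzeros of [[1, 0, 2], [0, 3, 0], [4, 0, 5]] processed two multipliers at a time.
A_NONZEROS = [(0, 0, 1), (0, 2, 2), (1, 1, 3), (2, 0, 4), (2, 2, 5)]
print(spmv(A_NONZEROS, [1, 2, 3], num_rows=3, num_multipliers=2))  # [7, 6, 19]
```

Row 2's nonzeros land in two different folds; the second fold simply adds onto the value already held in that row's output register.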
Universal primers for rift valley fever virus whole-genome sequencing
Rift Valley fever (RVF) is a mosquito-borne zoonotic disease that causes acute hemorrhagic fever. Accurate identification of mutations and phylogenetic characterization of RVF virus (RVFV) require whole-genome analysis, yet universal primers that amplify the entire RVFV genome from clinical samples with low copy numbers are currently unavailable. We therefore aimed to develop universal primers applicable to all known RVFV strains. Based on genome sequences available in public databases, we designed eight pairs of universal PCR primers covering the entire RVFV genome. To evaluate primer universality, we selected four RVFV strains spanning the virus's phylogenetic diversity: ZH548, Kenya 56 (IB8), BIME-01, and Lunyo. Nucleic acids of the test strains were either chemically synthesized or extracted from cell culture. Amplifying these RNAs with the PCR primers yielded products of the expected sizes (0.8–1.7 kb), and sequencing confirmed that the products covered the entire genome of each strain tested. Primer specificity was confirmed in silico by comparing the primers against all non-redundant nucleotide sequences in the NCBI database with the BLASTn alignment tool. To assess clinical applicability, we prepared mock clinical specimens containing human and RVFV RNA; the entire RVFV genome was successfully amplified and sequenced at a viral concentration of 10⁸ copies/mL. Given the universality, specificity, and clinical applicability of these primers, we anticipate that the RVFV universal primer pairs and the developed method will aid RVFV phylogenomics and mutation detection. © 2023, The Author(s).
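The abstract reports that sequencing confirmed full-genome coverage by the amplicons. When designing or checking a tiled primer set like this, the basic test is whether the amplicon intervals leave any gap on a segment. The following is a minimal sketch of that interval sweep, assuming 1-based inclusive coordinates; `coverage_gaps` is a hypothetical helper and the coordinates are invented for illustration, not the paper's primer positions.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end) on one genome segment


def coverage_gaps(amplicons: Sequence[Interval], segment_length: int) -> List[Interval]:
    """Return the stretches of a segment that no amplicon covers."""
    gaps: List[Interval] = []
    next_uncovered = 1  # first position not yet covered by any amplicon seen so far
    for start, end in sorted(amplicons):
        if start > next_uncovered:
            gaps.append((next_uncovered, start - 1))
        next_uncovered = max(next_uncovered, end + 1)
    if next_uncovered <= segment_length:
        gaps.append((next_uncovered, segment_length))
    return gaps


# Hypothetical amplicons on a hypothetical 3,000-nt segment.
print(coverage_gaps([(1, 1200), (1100, 2300), (2250, 3000)], 3000))  # []
print(coverage_gaps([(1, 1000), (1200, 3000)], 3000))  # [(1001, 1199)]
```

Because the RVFV genome consists of three segments (L, M, and S), such a check would be run once per segment.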